ABSTRACT

1-Nearest Neighbor with the Dynamic Time Warping (DTW)
distance is one of the most effective classifiers in the time
series domain. Since the global constraint was introduced in
the speech community, many global constraint models have been
proposed, including the Sakoe-Chiba (S-C) band, the Itakura
Parallelogram, and the Ratanamahatana-Keogh (R-K) band. The
R-K band is a general global constraint model that can
represent any global constraint of arbitrary shape and size
effectively. However, a good learning algorithm is needed to
discover the most suitable set of R-K bands, and the current
R-K band learning algorithm still suffers from an
'overfitting' phenomenon. In this paper, we propose two new
learning algorithms, i.e., a band boundary extraction
algorithm and an iterative learning algorithm. The band
boundary extraction is calculated from the bound of all
possible warping paths in each class, and the iterative
learning is adapted from the original R-K band learning. We
also use the Silhouette index, a well-known clustering
validation technique, as a heuristic function, and the lower
bound function, LB_Keogh, to enhance the prediction speed.
Twenty datasets from the Workshop and Challenge on Time
Series Classification, held in conjunction with SIGKDD 2007,
are used to evaluate our approach.
1. INTRODUCTION

Classification is one of the most important tasks in time
series data mining. The well-known 1-Nearest Neighbor (1-NN)
classifier with the Dynamic Time Warping (DTW) distance is
one of the best classifiers for time series data, among
other approaches such as Support Vector Machine (SVM) [9],
Artificial Neural Network (ANN) [3], and Decision Tree [6].

For 1-NN classification, selecting an appropriate distance
measure is crucial; however, the selection criterion still
depends largely on the nature of the data itself, especially
for time series data. Though the Euclidean distance is
commonly used to measure the dissimilarity between two time
series, it has been shown that the DTW distance is more
appropriate and produces more accurate results. The
Sakoe-Chiba Band (S-C Band) [8] was originally used to speed
up the DTW calculation and was later adopted as a DTW global
constraint. In addition, the S-C Band was first implemented
for the speech community, and the width of the global
constraint was fixed at 10% of the time series length.
However, recent work [5] reveals that the classification
accuracy depends solely on this global constraint, and that
the size of the constraint depends on the properties of the
data at hand. To determine a suitable size, all possible
widths of the global constraint are tested, and the band with
the maximum training accuracy is selected.

The Ratanamahatana-Keogh Band (R-K Band) [4] was introduced
to generalize the global constraint model, which it
represents as a one-dimensional array. The size of the array
and the maximum constraint value are limited to the length of
the time series. The main feature of the R-K band is its
multiple bands, where each band represents one class of data.
Unlike the single S-C band, each of these R-K bands can be
adjusted as needed according to the warping paths of its own
class.

Although the R-K band allows great flexibility in adjusting
the global constraint, a learning algorithm is needed to
discover the 'best' multi R-K bands. In the original work on
the R-K Band, a hill climbing search algorithm with two
heuristic functions (accuracy and distance metrics) is
proposed. The search algorithm climbs through the search
space by trying to increase or decrease specific parts of the
bands until the terminal conditions are met. However, this
learning algorithm still suffers from an 'overfitting'
phenomenon since an accuracy metric is used as a heuristic
function to guide the search.



To solve this problem, we propose two new learning
algorithms, i.e., band boundary extraction and iterative
learning. The band boundary extraction method first obtains
the maximum, mean, and mode of the paths' positions on the
DTW distance matrix. In the iterative learning, the bands'
structures are adjusted in each round of the iteration
according to a Silhouette index [7]. We run both algorithms
and select the band that gives the better result. In the
prediction step, 1-NN using the Dynamic Time Warping distance
with this discovered band is used to classify unlabeled data.
Note that a lower bound, LB_Keogh [2], is also used to speed
up our 1-NN classification.




Figure 1: DTW without using global constraint may 
introduce an unwanted warping. 



The rest of this paper is organized as follows. Section 2 gives 
some important background for our proposed work. In Sec- 
tion 3, we introduce our approach, the two novel learning 
algorithms. Section 4 contains an experimental evaluation 
including some examples of each dataset. Finally, we con- 
clude this paper in Section 5. 

2. BACKGROUND 

Our novel learning algorithms are based on four major fun- 
damental concepts, i.e., Dynamic Time Warping (DTW) 
distance, Sakoe-Chiba band (S-C band), Ratanamahatana- 
Keogh band (R-K band), and Silhouette index, which are 
briefly described in the following sections. 

2.1 Dynamic Time Warping Distance 

Dynamic Time Warping (DTW) [5] distance is a well-known
similarity measure based on shape. It uses a dynamic
programming technique to search over all possible warping
paths and selects the one with the minimum distance between
two time series. To calculate the distance, it first creates
a distance matrix, where each element in the matrix is a
cumulative distance of the minimum of three surrounding
neighbors. Suppose we have two time series, a sequence $Q$ of
length $n$ ($Q = q_1, q_2, \ldots, q_i, \ldots, q_n$) and a
sequence $C$ of length $m$
($C = c_1, c_2, \ldots, c_j, \ldots, c_m$). First, we create
an $n$-by-$m$ matrix, where every $(i, j)$ element of the
matrix is the cumulative distance of the distance at $(i, j)$
and the minimum of three neighboring elements, where
$1 \le i \le n$ and $1 \le j \le m$. We can define the
$(i, j)$ element, $\gamma_{i,j}$, of the matrix as:

$$\gamma_{i,j} = d_{i,j} + \min\{\gamma_{i-1,j},\ \gamma_{i,j-1},\ \gamma_{i-1,j-1}\} \qquad (1)$$

where $d_{i,j} = (q_i - c_j)^2$ is the squared distance of
$q_i$ and $c_j$, and $\gamma_{i,j}$ is the summation of
$d_{i,j}$ and the minimum cumulative distance of the three
elements surrounding the $(i, j)$ element. Then, to find an
optimal path, we choose the path that yields the minimum
cumulative distance at $(n, m)$, which is defined as:

$$D_{dtw}(Q, C) = \min_{\forall w \in P} \left\{ \sqrt{\sum_{k=1}^{K} d_{w_k}} \right\} \qquad (2)$$

where $P$ is the set of all possible warping paths, $w_k$ is
the matrix cell $(i, j)$ at the $k$-th element of a warping
path, and $K$ is the length of the warping path.
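
The recurrence in Equation (1) and the final distance in Equation (2) can be illustrated with a minimal sketch. The function name `dtw_distance` and the padded matrix layout are assumptions made for this illustration; the sketch applies no global constraint.

```python
import math


def dtw_distance(q, c):
    """Unconstrained DTW distance between sequences q and c.

    gamma[i][j] is the cumulative distance of Equation (1); row 0 and
    column 0 are padding so that the first cell has a valid predecessor.
    """
    n, m = len(q), len(c)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2  # squared distance d_{i,j}
            gamma[i][j] = d + min(gamma[i - 1][j],      # step from above
                                  gamma[i][j - 1],      # step from the left
                                  gamma[i - 1][j - 1])  # diagonal step
    # The square root of the cumulative cost at (n, m), as in Equation (2).
    return math.sqrt(gamma[n][m])
```

For instance, the sequences `[0, 0, 1, 2]` and `[0, 1, 2]` differ only by a repeated first value, so warping aligns them exactly and the distance is zero.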




Figure 2: Global constraint examples of (a) R-K 
band (b) S-C band, and (c) Itakura Parallelogram. 



In reality, DTW may not give the mapping we need because it
always seeks the minimum distance, and it may therefore
generate an unwanted path. For example, in Figure 1 [5],
without a global constraint, DTW will find its optimal
mapping between the two time series. However, in many cases,
this is probably not what we intend when the two time series
are expected to be of different classes. We can resolve this
problem by limiting the permissible warping paths using a
global constraint. Two well-known global constraints, the
Sakoe-Chiba band and the Itakura Parallelogram [1], and a
recent representation, the Ratanamahatana-Keogh band (R-K
band), have been proposed. Figure 2 [4] shows an example of
each type of constraint.

2.2 Sakoe-Chiba Band 

Sakoe-Chiba band (S-C band), shown in Figure 2 (b), is one of
the simplest and most popular global constraints, originally
introduced for the speech community. The width of this global
constraint is generally set to 10% of the time series length.
However, recent work [5] has shown that different band sizes
can lead to a more accurate classification. We therefore need
to test all possible widths of the global constraint so that
the best width can be discovered. An evaluation function is
needed to justify the selection. We commonly use an accuracy
metric (the training accuracy) as the measurement. Table 1
shows an algorithm for finding the best warping window for
the S-C band by decreasing the band size by 1% in each step.
This function receives a set of data T as an input and gives
the best warping window (best_band) as an output. Note that
if an evaluation value is equal to the best evaluation value,
we prefer the smaller warping window size.

Table 1: Finding the best warping window.

Function [best_band] = BestWarping[T]

  best_evaluate = NegativeInfinite;
  for (k = 100 to 0)
    band_k = S-C band at k% width;
    evaluate = evaluate(band_k);
    if (evaluate >= best_evaluate)
      best_evaluate = evaluate;
      best_band = band_k;
    endif
  endfor
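
The search in Table 1 can be sketched as follows, using leave-one-out 1-NN training accuracy as the evaluation function. All function names here are hypothetical, the series are assumed to have equal length, and rounding the k% width down to a whole number of cells is an assumption of this sketch.

```python
import math


def sc_band_dtw(q, c, width):
    """DTW restricted to cells with |i - j| <= width (equal-length series)."""
    n = len(q)
    gamma = [[math.inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - width), min(n, i + width) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j], gamma[i][j - 1],
                                  gamma[i - 1][j - 1])
    return math.sqrt(gamma[n][n])


def loo_accuracy(series, labels, width):
    """Leave-one-out 1-NN training accuracy for a given band width."""
    correct = 0
    for i, s in enumerate(series):
        best_d, best_label = math.inf, None
        for j, t in enumerate(series):
            if i == j:
                continue  # leave the query itself out
            d = sc_band_dtw(s, t, width)
            if d < best_d:
                best_d, best_label = d, labels[j]
        correct += best_label == labels[i]
    return correct / len(series)


def best_warping_window(series, labels):
    """Scan widths from 100% down to 0%; ties go to the smaller width."""
    n = len(series[0])
    best_score, best_percent = -math.inf, None
    for percent in range(100, -1, -1):
        width = (percent * n) // 100  # assumption: round down to whole cells
        score = loo_accuracy(series, labels, width)
        if score >= best_score:  # ">=" keeps the later, smaller window
            best_score, best_percent = score, percent
    return best_percent, best_score
```

Because the scan runs from wide to narrow and accepts ties, a dataset that is already perfectly separated at every width ends with a 0% window, which corresponds to the Euclidean distance.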

2.3 Ratanamahatana-Keogh Band 

Ratanamahatana-Keogh band (R-K band) [4] is a general model
of a global constraint specified by a one-dimensional array
$R$, i.e., $R = r_1, r_2, \ldots, r_i, \ldots, r_n$, where
$n$ is the length of the time series and $r_i$ is the height
above the diagonal in the y direction and the width to the
right of the diagonal in the x direction. Each $r_i$ value is
arbitrary; therefore, the R-K band is an arbitrary-shape
global constraint, as shown in Figure 2 (a). Note that when
$r_i = 0$ for all $1 \le i \le n$, this R-K band represents
the Euclidean distance, and when $r_i = n$ for all
$1 \le i \le n$, this R-K band represents the original DTW
distance with no global constraint. The R-K band is also able
to represent the S-C band by setting all $r_i = c$, where $c$
is the width of the global constraint. Moreover, the R-K band
is a multi-band model, which can be effectively used to
represent one band for each class of data. This flexibility
is a great advantage; however, the higher the number of
classes, the higher the time complexity, as we have to search
through a much larger space.
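
To make the array representation concrete, the sketch below computes DTW under a band given as a list `r`. The membership test in `rk_allows` is one possible reading of "height above the diagonal and width to the right of the diagonal" and is an assumption of this sketch, as are the function names; the series are assumed to have equal length.

```python
import math


def rk_allows(r, i, j):
    """True if cell (i, j), 0-based, lies inside the band described by r.

    Assumption: r[i] bounds how far the path may rise above the diagonal
    at position i, and symmetrically how far it may extend to the right.
    """
    if j >= i:
        return j - i <= r[i]
    return i - j <= r[j]


def rk_band_dtw(q, c, r):
    """DTW distance where only cells permitted by the band r are used."""
    n = len(q)
    gamma = [[math.inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if not rk_allows(r, i - 1, j - 1):
                continue  # cell outside the global constraint
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j], gamma[i][j - 1],
                                  gamma[i - 1][j - 1])
    return math.sqrt(gamma[n][n])


n = 4
euclidean_band = [0] * n  # r_i = 0: only the diagonal is allowed
full_band = [n] * n       # r_i = n: no global constraint
sc_band = [1] * n         # r_i = c: an S-C band of width c = 1
```

With `q = [0, 0, 1, 2]` and `c = [0, 1, 2, 2]`, the all-zero band gives the Euclidean distance, while the other two bands allow the one-step shift that aligns the series exactly.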

Since determining the optimal R-K band for each training set
is highly computationally intensive, a hill climbing search
and heuristic functions have been introduced to guide which
part of the space should be evaluated. A space is defined as
a segment of a band to be increased or decreased. In the
original work, two heuristic functions, an accuracy metric
and a distance metric, are used to evaluate a state. The
accuracy metric is the training accuracy obtained using
leave-one-out 1-NN, and the distance metric is the ratio of
the mean DTW distances of correctly classified and
incorrectly classified objects. However, these heuristic
functions do not reflect the true quality of a band because,
empirically, we have found that the resulting bands tend to
'overfit' the training data.

Two searching directions are considered, i.e., forward search
and backward search. In forward search, we start from the
Euclidean distance (all $r_i$ in $R$ equal to 0), and parts
of the band are gradually increased in each searching step.
In the case where two bands have the same heuristic value,
the wider band is selected. On the other hand, in backward
search, we start from a very large band (all $r_i$ in $R$
equal to $n$, where $n$ is the length of the time series),
and parts of the band are gradually decreased in each
searching step. If two bands have the same heuristic value,
the tighter band is chosen.

Table 2: The pseudo code for multiple R-K bands learning.

Function [band] = Learning[T, threshold]

  N = size of T;
  L = length of data in T;
  initialize band_i for i = 1 to c;
  foreachclass i = 1 to c
    enqueue(1, L, Queue_i);
  endfor
  best_evaluate = evaluate(T, band);
  while !empty(Queue)
    foreachclass i = 1 to c
      if !empty(Queue_i)
        [start, end] = dequeue(Queue_i);
        adjustable = adjust(band_i, start, end);
        if adjustable
          evaluate = evaluate(T, band);
          if evaluate > best_evaluate
            best_evaluate = evaluate;
            enqueue(start, end, Queue_i);
          else
            undo_adjustment(band_i, start, end);
            if (end - start) / 2 > threshold
              enqueue(start, mid-1, Queue_i);
              enqueue(mid, end, Queue_i);
            endif
          endif
        endif
      endif
    endfor
  endwhile

The learning algorithm starts by enqueuing the starting and
ending positions of the R-K Band. In each iteration, these
values are dequeued and used as the boundary of a band
increase or decrease. The adjusted band is then evaluated. If
the heuristic value is higher than the current best heuristic
value, the same start and end values are enqueued again. If
not, this part is further divided into two equal subparts
before being enqueued, as shown in Figure 3 [4]. The
iterations continue until a termination condition is met.
Table 2 shows the pseudo code for this multiple R-K bands
learning.

2.4 Silhouette Index 

Silhouette index (SI) [7], or Silhouette width, is a
well-known clustering validity technique, originally used to
determine the number of clusters in a dataset. This index
measures the 'quality' of separation and compactness of a
clustered dataset, so the number of clusters is determined by
selecting the number that gives the maximum index value.

Figure 3: An illustration of the concept in R-K band
forward searching algorithm.

The Silhouette index is based on a compactness-separation
measurement, which consists of an inter-cluster distance (a
distance between two data in different clusters) and an
intra-cluster distance (a distance between two data in the
same cluster). A well-clustered dataset has a high
inter-cluster distance and a low intra-cluster distance. In
other words, in a well-clustered dataset, data from different
clusters are well separated, and data from the same cluster
are grouped closely together. The Silhouette index for each
data $i$ is defined by the following equations:



$$s(i) = \frac{b(i) - a(i)}{\max\{b(i),\ a(i)\}} \qquad (3)$$

$$b(i) = \min_{c \in C,\ c \neq label(i)} \left( \frac{1}{N_{D_c}} \sum_{j \in D_c} d(i, j) \right) \qquad (4)$$

$$a(i) = \frac{1}{N_{D_{label(i)}}} \sum_{j \in D_{label(i)}} d(i, j) \qquad (5)$$

where $s(i)$ is the Silhouette index of the $i$-th data,
$b(i)$ is the minimum average distance between the $i$-th
data and the data of each different cluster, and $a(i)$ is
the average distance between the $i$-th data and each of the
same-cluster data. In Equations (4) and (5), $C$ is the set
of all possible clusters, $D_c$ is the set of data in cluster
$c$, $N_{D_c}$ is the size of $D_c$, $label(i)$ is the
cluster of the $i$-th data, and $d(i, j)$ is the distance
measure function comparing the $i$-th and $j$-th data. Note
that $s(i)$ ranges between -1 and 1. Having $s(i)$ close to 1
means that the data is well separated. The Global Silhouette
index (GS) for a dataset is calculated as follows:

$$GS = \frac{1}{c} \sum_{j=1}^{c} S_j \qquad (6)$$

$$S_j = \frac{1}{M} \sum_{i=1}^{M} s(i) \qquad (7)$$

where $c$ is the number of clusters, $S_j$ is the Silhouette
index for cluster $j$, and $M$ is the number of data in
cluster $j$. The pseudo code for the Silhouette index
function is shown in Table 3.

Table 3: Silhouette index function.

Function [index] = Silhouette[D]

  N = size of D;
  sum_All = 0;
  foreachclass j = 1 to c
    M = size of D_j;
    sum_Class = 0;
    for i = 1 to M
      b = b(i);
      a = a(i);
      s = (b - a) / max(b, a);
      sum_Class += s;
    endfor
    sum_All += sum_Class / M;
  endfor
  index = sum_All / c;
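
A minimal sketch of the Global Silhouette computation in Table 3 is given below. It assumes a precomputed pairwise distance matrix `dist` and at least two clusters; the function name and the guard against a zero denominator are additions for this illustration. Following the averages as written in this section, the same-cluster mean includes the data item itself (with distance zero), which differs slightly from some other formulations of the Silhouette index.

```python
def global_silhouette(dist, labels):
    """Average of the per-cluster mean Silhouette values.

    dist[i][j] is the distance between items i and j; labels[i] is the
    cluster of item i. Requires at least two distinct clusters.
    """
    clusters = sorted(set(labels))
    members = {c: [i for i, lab in enumerate(labels) if lab == c]
               for c in clusters}

    def mean_dist(i, cluster):
        idx = members[cluster]
        return sum(dist[i][j] for j in idx) / len(idx)

    sum_all = 0.0
    for c in clusters:
        sum_class = 0.0
        for i in members[c]:
            a = mean_dist(i, c)  # average same-cluster distance
            b = min(mean_dist(i, other)  # closest different cluster
                    for other in clusters if other != c)
            denom = max(a, b)
            # Guard (an addition): define s(i) = 0 when both means are zero.
            sum_class += (b - a) / denom if denom > 0 else 0.0
        sum_all += sum_class / len(members[c])
    return sum_all / len(clusters)
```

For four points on a line at 0, 1, 10, and 11, grouped as {0, 1} and {10, 11}, the index is close to 1, reflecting two compact and well-separated clusters.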

3. METHODOLOGY 

In this section, we describe our approach, developed from the
techniques described in Section 2, i.e., the DTW distance,
the best warping window for the Sakoe-Chiba band, multiple
R-K bands, and the Silhouette index. In brief, our approach
consists of 5 major parts: 1) data preprocessing that reduces
the length of time series data, 2) our proposed band boundary
extraction algorithm, 3) finding the best warping window for
the Sakoe-Chiba band, 4) our proposed iterative R-K band
learning, and 5) prediction for unlabeled data.

Our approach requires four input parameters, i.e., a set of
training data T, a set of unlabeled data (test data) P, the
maximum complexity that depends on time and computational
resources, and the bound of a warping window size. In the
data preprocessing step, we can reduce the computational
complexity for very long time series by applying an
interpolation function to both the training and test data.
After that, we try to find the best R-K band by running the
band boundary extraction algorithm. The best warping window
is then calculated and used as the initial band of our
proposed iterative learning. After learning has finished, the
two R-K bands are compared and the better one is selected.
Finally, we calculate a training accuracy and make
predictions for the test data using 1-NN with the DTW
distance and the best band, enhanced with the LB_Keogh lower
bound to speed up our classification approach. The prediction
result A, along with the training accuracy, is returned as
shown in Table 4.



3.1 Data Preprocessing 

Since the classification prediction time may become a major
constraint, a data preprocessing step is needed. In this
step, we approximate the calculation complexity and try to
reduce any complexity exceeding the threshold. Our
approximated complexity is mainly based on the number of
items in the training data, its length, and the number of
heuristic function evaluations. Suppose we have $n$ training
data, each $m$ data points in length; we can calculate the
complexity by the following equation:

$$complexity(n, m) = \log(n \times m)$$

where the logarithm function is added to bring the value down
to a more manageable range for users.

To decrease the complexity, we can reduce the length of each
individual time series by using a typical interpolation
function. The new length of the time series is set to the
current length divided by two. We keep reducing the time
series length until the complexity no longer exceeds the
user-defined threshold. Table 5 shows the preprocessing steps
on a set of training data T, a set of unlabeled data P, the
original length L, and the complexity threshold. In this
work, we set this threshold value to 9, according to the
resources and the time constraint of this 24-hour Workshop
and Challenge on Time Series Classification.

Table 4: Our classification approach.

Function [A, accuracy] = OurApproach[T, P, complexity, bound]

  L = length of data in T;
  [T, P, L] = preprocess(T, P, L, complexity);
  [best_band, best_heuristic] = band_extraction(T);
  R = best_warping(T, bound);
  [band, heuristic] = iterative_learning(T, R, L, bound);
  if (heuristic > best_heuristic)
    best_heuristic = heuristic;
    best_band = band;
  endif
  accuracy = leave_one_out(T, best_band);
  A = predict(T, P, best_band);

Table 5: Data preprocessing step.

Function [NewT, NewP, NewL] = PreProcess[T, P, L, threshold]

  alpha = complexity(T, L);
  set NewT = T, NewP = P, and NewL = L;
  while (alpha > threshold)
    NewL = NewL / 2;
    NewT = interpolate(NewT, NewL);
    NewP = interpolate(NewP, NewL);
    alpha = complexity(NewT, NewL);
  endwhile
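
The halving loop of Table 5 can be sketched as below. The text does not state the base of the logarithm or the exact interpolation function, so the base-10 logarithm in `complexity` and the linear interpolation in `resample` are stand-ins chosen for this illustration, and all function names are hypothetical.

```python
import math


def complexity(n, m):
    """Stand-in complexity estimate: log of (items x length).

    Assumption: base-10 logarithm; the base is not given in the text.
    """
    return math.log10(n * m)


def resample(series, new_len):
    """Resample a series to new_len points by linear interpolation."""
    old_len = len(series)
    if new_len == 1:
        return [series[0]]
    out = []
    for k in range(new_len):
        pos = k * (old_len - 1) / (new_len - 1)  # position on the old axis
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def preprocess(train, test, threshold):
    """Halve the series length until the complexity is within threshold."""
    length = len(train[0])
    while complexity(len(train), length) > threshold and length > 1:
        length //= 2
        train = [resample(s, length) for s in train]
        test = [resample(s, length) for s in test]
    return train, test, length
```

The same reduced length is applied to both the training and the test data, so that the learned band and the distance computations remain comparable across the two sets.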

3.2 Boundary Band Extraction 

The multi R-K band model allows a learning algorithm to
create a different band for each class. The boundary band
extraction algorithm is derived from a simple intuition: for
each pair of same-class data, we first calculate their DTW
distance, save the warping paths, and plot those paths on a
matrix (called a path matrix). After that, we determine an
appropriate R-K band. For each $r_i$ of this R-K band, the
$r_i$ value is set to the maximum of the height above the
diagonal in the y direction and the width to the right of the
diagonal in the x direction in the path matrix. We repeat
these steps for every class in the dataset; we call this R-K
band a MaxBand. Similarly, the band extraction is performed
using the mean and the mode instead of the maximum, resulting
in a MeanBand and a ModeBand, respectively. Figure 4
illustrates the steps in creating a MaxBand. From these
calculations, three multiple R-K bands are generated. The
evaluation function is used to select the best band to be
returned as the output of this algorithm. Table 6 shows the
band boundary extraction algorithm, which takes a set of
training data T and returns the best R-K band and the best
heuristic value.

Table 6: Boundary Band Extraction Algorithm.

Function [best_band, best_heuristic] = BandExtraction[T]

  N = size of T;
  L = length of data in T;
  initialize path_matrix = new array[L][L];
  initialize R for Max, Mean, and Mode
  foreachclass (k = 1 to c)
    N_k = size of T_k;
    for (i = 1 to N_k)
      for (j = 1 to N_k)
        if (i != j)
          Path = dtw_path(i, j);
          for (all point p in Path)
            path_matrix[p.x][p.y]++;
          endfor
        endif
      endfor
    endfor
    for (i = 1 to L)
      Max_k[i] = maximum warping path at r_i
      Mean_k[i] = mean warping path at r_i
      Mode_k[i] = mode warping path at r_i
    endfor
  endfor
  [best_band, best_heuristic] = best_band(Max, Mean, Mode);
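
The MaxBand construction for a single class can be sketched as follows. The function names are hypothetical, the tie-breaking rule during backtracking (diagonal step first) is an assumption, and the sketch records the maximum deviation directly instead of materializing the path matrix; the MeanBand and ModeBand would replace the maximum with the corresponding statistic.

```python
import math


def dtw_path(q, c):
    """Optimal unconstrained warping path as a list of 0-based cells."""
    n, m = len(q), len(c)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j], gamma[i][j - 1],
                                  gamma[i - 1][j - 1])
    # Backtrack from (n, m) to (1, 1) through the cheapest predecessor.
    path = [(n - 1, m - 1)]
    i, j = n, m
    while (i, j) != (1, 1):
        _, i, j = min((gamma[i - 1][j - 1], i - 1, j - 1),
                      (gamma[i - 1][j], i - 1, j),
                      (gamma[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return path[::-1]


def max_band(series):
    """MaxBand for one class: largest deviation from the diagonal per r_i."""
    n = len(series[0])
    r = [0] * n
    for a in range(len(series)):
        for b in range(len(series)):
            if a == b:
                continue
            for i, j in dtw_path(series[a], series[b]):
                if j >= i:
                    r[i] = max(r[i], j - i)  # height above the diagonal
                else:
                    r[j] = max(r[j], i - j)  # width to the right of it
    return r
```

Running `max_band` once per class yields one band per class, which together form the multi R-K band that is then scored by the evaluation function.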

Figure 4: The illustration of MaxBand calculation: (a)
finding all possible warping paths, (b) plotting all paths in
a path matrix, and (c) calculating the maximum value for each
$r_i$.

3.3 Finding the Best Warping Window

In this step, we try to find the best warping window of the
Sakoe-Chiba band to be used as an input of our proposed
iterative R-K band learning. This function is slightly
different from the original one in that we bound the maximum
width of the warping window and we use our evaluation
function (heuristic function) instead of the typical training
accuracy. A simple pseudo code is described in Table 7. A set
of training data T and a maximum warping window size are
required in discovering the best warping window R.

Table 7: A search algorithm for the best S-C band warping
window.

Function [R] = BestWarping[T, bound]

  N = size of T;
  best_heuristic = NegativeInfinite;
  for (k = bound to 0)
    band_k = S-C Band at k% width;
    heuristic = evaluate(band_k);
    if (heuristic > best_heuristic)
      best_heuristic = heuristic;
      R = k;
    endif
  endfor

Table 8: Our proposed Iterative R-K band learning algorithm.

Function [band] = IterativeLearning[T, R, L, bound]

  initialize band_i for i = 1 to c equal to R% of L;
  threshold = L / 2;
  best_heuristic = evaluate(T, band);
  while (threshold >= 1)
    fw_band = forward_learning(T, threshold, band, bound);
    bw_band = backward_learning(T, threshold, band, bound);
    fw_heuristic = evaluate(T, fw_band);
    bw_heuristic = evaluate(T, bw_band);
    band = maximum heuristic value band;
    heuristic = maximum heuristic value;
    if (heuristic = best_heuristic)
      threshold = threshold / 2;
    else
      best_heuristic = heuristic;
    endif
  endwhile

3.4 Iterative Band Learning 

The iterative R-K band learning extends the original learning
in that it repeats the learning again and again until the
heuristic value no longer increases. In the first step, we
initialize all the multi R-K bands with an R% Sakoe-Chiba
band, where R is the output of the best warping window search
algorithm. We also set a learning threshold to half of the
time series length, and the initial bands are evaluated.

In each iteration, our proposed algorithm learns a new R-K
band starting from the previous R-K band learning result. We
run both forward and backward learning and select the band
that gives the higher heuristic value. If the heuristic value
is the same as the best heuristic value, the threshold is
divided by two; otherwise, we update the best heuristic
value. We repeat these steps until the threshold falls below
1. Table 8 shows our proposed algorithm, iterative R-K band
learning, which requires a set of training data T, a best
warping window R, the length of time series L, and the bound
of the warping window.

Table 9: Our proposed R-K band learning algorithm.

Function [R] = ProposedLearning[T, threshold, band, bound]

  L = length of data in T;
  foreachclass i = 1 to c
    enqueue(1, L, i, Queue);
  endfor
  best_heuristic = evaluate(T, band);
  while !empty(Queue)
    [start, end, label] = randomly_dequeue(Queue);
    adjustable = adjust(band_label, start, end, bound);
    if adjustable
      heuristic = evaluate(T, band);
      if heuristic > best_heuristic
        best_heuristic = heuristic;
        enqueue(start, end, label, Queue);
      else
        undo_adjustment(band_label, start, end);
        if (end - start) / 2 > threshold
          enqueue(start, mid-1, label, Queue);
          enqueue(mid, end, label, Queue);
        endif
      endif
    endif
  endwhile

We have modified the original multi R-K bands learning by
changing its data structure. We replace the multiple queues,
which were originally assigned one per class, with a single
queue that adds a label parameter to each start-end object.
This new queue draws an object randomly instead of in a
last-in-first-out (LIFO) manner. In addition, we also change
the adjustment function by adding a new parameter, bound,
which prevents forward learning from increasing the band's
size beyond the given limit. Table 9 shows the proposed R-K
band learning on a set of training data T, a learning
threshold, an initial band, and the bound of the warping
window.
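
The single-queue search of Table 9 can be sketched in the forward direction as follows. The heuristic is passed in as a callable `evaluate`, standing in for the Silhouette-based function; the names `learn_bands`, `step`, and `seed`, the 0-based segment indices, and the choice of midpoint are assumptions of this illustration.

```python
import random


def learn_bands(bands, evaluate, threshold, bound, step=1, seed=0):
    """Forward hill climbing over per-class bands with one shared queue.

    bands maps a class label to its band (a list of r_i values) and is
    modified in place; evaluate(bands) returns the heuristic value.
    """
    rng = random.Random(seed)
    length = len(next(iter(bands.values())))
    queue = [(0, length - 1, label) for label in bands]
    best = evaluate(bands)
    while queue:
        # Draw a random (start, end, label) object from the single queue.
        start, end, label = queue.pop(rng.randrange(len(queue)))
        band = bands[label]
        segment = range(start, end + 1)
        if any(band[i] + step > bound for i in segment):
            continue  # not adjustable: widening would exceed the bound
        for i in segment:
            band[i] += step
        score = evaluate(bands)
        if score > best:
            best = score
            queue.append((start, end, label))  # try widening again later
        else:
            for i in segment:
                band[i] -= step  # undo the adjustment
            if (end - start) / 2 > threshold:
                mid = (start + end + 1) // 2
                queue.append((start, mid - 1, label))
                queue.append((mid, end, label))
    return bands, best
```

With a toy heuristic that rewards closeness to a known target band, the search widens only the segment where widening helps and leaves the rest at zero. A backward variant would start from wide bands and subtract `step` instead.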



3.5 Evaluation Function 

In Section 2.4, we briefly described the utility and the
algorithm of the Silhouette index. This index is commonly
used to measure the quality of a clustered dataset; however,
we can also utilize the Silhouette index as a heuristic
function to measure the quality of a distance measure. The
DTW distance with multi R-K bands is a distance measure that
requires one additional parameter, Band, specifying the R-K
band to be used (since the multi R-K bands contain one band
for each class). Table 10 shows the evaluation function
derived from the original Silhouette index, which is computed
from the following band-dependent quantities:

$$s(i, Band) = \frac{b(i, Band) - a(i, Band)}{\max\{b(i, Band),\ a(i, Band)\}} \qquad (8)$$

$$b(i, Band) = \min_{c \in C,\ c \neq label(i)} \left( \frac{1}{N_{D_c}} \sum_{j \in D_c} d(i, j, Band_{label(j)}) \right) \qquad (9)$$

$$a(i, Band) = \frac{1}{N_{D_{label(i)}}} \sum_{j \in D_{label(i)}} d(i, j, Band_{label(i)}) \qquad (10)$$

Table 10: An evaluation (heuristic) function.

Function [index] = Evaluate[D, B]

  N = size of D;
  sum_All = 0;
  foreachclass (j = 1 to c)
    M = size of D_j;
    sum_Class = 0;
    for (i = 1 to M)
      b = b(i, B);
      a = a(i, B);
      s = (b - a) / max(b, a);
      sum_Class += s;
    endfor
    sum_All += sum_Class / M;
  endfor
  index = sum_All / c;



3.6 Data Prediction 

After the best multi R-K bands are discovered, we use
1-Nearest Neighbor as the classifier and the Dynamic Time
Warping distance measure with these best R-K bands to predict
the labels of the unlabeled test data. The LB_Keogh lower
bound is also used to speed up the DTW computation.
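
The pruning idea can be sketched as follows. For simplicity, the sketch uses a single uniform band of width `w` for all training series instead of one learned R-K band per class, which is a simplifying assumption; the function names are hypothetical. A candidate is skipped whenever its lower bound is already no better than the best distance found so far, so the more expensive DTW computation runs only when it could change the result.

```python
import math


def lb_keogh(q, c, w):
    """LB_Keogh lower bound of the width-w constrained DTW of q and c."""
    n = len(c)
    total = 0.0
    for i, x in enumerate(q):
        window = c[max(0, i - w): min(n, i + w + 1)]
        upper, lower = max(window), min(window)  # envelope around c at i
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (x - lower) ** 2
    return math.sqrt(total)


def banded_dtw(q, c, w):
    """DTW restricted to cells with |i - j| <= w (equal-length series)."""
    n = len(q)
    gamma = [[math.inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j], gamma[i][j - 1],
                                  gamma[i - 1][j - 1])
    return math.sqrt(gamma[n][n])


def classify_1nn(query, train, labels, w):
    """1-NN prediction; returns the label and the number of DTW calls."""
    best_d, best_label, dtw_calls = math.inf, None, 0
    for series, label in zip(train, labels):
        if lb_keogh(query, series, w) >= best_d:
            continue  # the true distance cannot beat the current best
        d = banded_dtw(query, series, w)
        dtw_calls += 1
        if d < best_d:
            best_d, best_label = d, label
    return best_label, dtw_calls
```

Returning the number of DTW calls alongside the label makes the effect of the lower bound visible on a given dataset.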

4. EXPERIMENTAL EVALUATION 

To evaluate the performance, we use our approach, described 
in Section 3, to classify all 20 contest datasets, and then 
send our predicted results and the expected accuracies to the 
contest organizer. The results are calculated by the contest 
organizer and subsequently sent back to us. 

4.1 Datasets 

We use the datasets from the Workshop and Challenge on Time
Series Classification, held in conjunction with the
thirteenth SIGKDD 2007 conference. The datasets are from very
diverse domains (e.g., stock data, medical data, etc.); some
are from real-world problems, and some are synthetically
generated. The amount of training data and its length in each
dataset also vary from the size of 20 to 1000 training
instances and the length of 30 to 2000 data points. In
addition, all data are individually normalized using
Z-normalization. Examples of each dataset are shown in Figure
5, and the datasets' properties are shown in Table 11.

Figure 5: Some samples from each of the 20 datasets.



Table 11: The dataset properties.

Dataset  Classes  Training data size  Test data size  Length of each time series
1        8        55                  2345            1024
2        2        67                  1029            24
3        2        367                 1620            512
4        2        178                 1085            512
5        4        40                  1380            1639
6        5        155                 308             1092
7        6        25                  995             398
8        10       381                 760             99
9        2        20                  601             70
10       2        27                  953             65
11       2        23                  1139            82
12       3        1000                8236            1024
13       4        16                  306             345
14       2        20                  1252            84
15       3        467                 3840            166
16       2        23                  861             136
17       2        73                  936             405
18       7        100                 550             1882
19       12       200                 2050            131
20       15       267                 638             270

Table 12: The predicted and test accuracies.

Dataset  Predicted accuracy  Test accuracy
1        0.9636              0.6505
2        0.9403              0.9161
3        0.4714              0.3491
4        0.9494              0.9231
5        0.9500              0.8537
6        0.1871              0.6099
7        1.0000              0.8714
8        0.7428              0.9346
9        0.9500              0.8488
10       0.8519              0.8507
11       0.9565              0.8489
12       0.8570              0.8353
13       0.9375              0.7250
14       0.9000              0.9276
15       0.6017              0.4435
16       0.7391              0.8645
17       0.8904              0.9346
18       0.2900              0.5049
19       0.9500              0.9660
20       0.6966              0.9275


4.2 Results 

The predicted result is generated after running our algorithm
to find the best R-K band within the competition's 24-hour
time constraint. More specifically, the predicted accuracy is
calculated by leave-one-out cross validation on the training
dataset. Table 12 shows, for all 20 datasets, our predicted
accuracies, which were submitted to the contest organizer,
and the test accuracies, which were calculated and returned
by the organizer. Because of the small number of training
data, the predicted accuracy and the test accuracy are
different in some cases.

5. CONCLUSION 

In this work, we propose a new efficient time series
classification algorithm based on 1-Nearest Neighbor
classification using the Dynamic Time Warping distance with
multi R-K bands as a global constraint. To select the best
R-K band, we use our two proposed learning algorithms, i.e.,
the band boundary extraction algorithm and iterative
learning. The Silhouette index is used as a heuristic
function for selecting the band that yields the best
prediction accuracy. The LB_Keogh lower bound is also used in
the data prediction step to speed up the computation.